Abstract

In this paper we study the problem of tracking an object moving randomly through a network of 
wireless sensors. Our objective is to devise strategies for scheduling the sensors to optimize the tradeoff 
between tracking performance and energy consumption. We cast the scheduling problem as a Partially 
Observable Markov Decision Process (POMDP), where the control actions correspond to the set of 
sensors to activate at each time step. Using a bottom-up approach, we consider different sensing, motion 
and cost models with increasing levels of difficulty. At the first level, the sensing regions of the different 
sensors do not overlap and the target is only observed within the sensing range of an active sensor. Then, 
we consider sensors with overlapping sensing range such that the tracking error, and hence the actions 
of the different sensors, are tightly coupled. Finally, we consider scenarios wherein the target locations 
and sensors' observations assume values on continuous spaces. Exact solutions are generally intractable 
even for the simplest models due to the dimensionality of the information and action spaces. Hence, 
we devise approximate solution techniques, and in some cases derive lower bounds on the optimal 
tradeoff curves. The generated scheduling policies, albeit suboptimal, often provide close-to-optimal 
energy-tracking tradeoffs. 

I. Introduction 

In large networks of inexpensive sensors with small batteries, the sensor nodes are required to operate 
on limited energy budgets. Sensor management can prolong the lifetime of a sensor network and conserve 
scarce energy resources. However, inefficient management could result in severe performance degradation.

In this paper, we consider a network of n sensors tracking a single object. The sensors can be turned 
on or off at consecutive time steps and the goal is to select the subset of sensors to activate at each time 
step. This problem is challenging due to the inherent tradeoff between the value of information in the 
sensor measurements and the energy cost, combined with the combinatorial complexity of the decision 
space. 

In previous work [1], two of the authors considered approximate strategies for sensor sleeping, where
the sensors are put to sleep to save energy and decisions are made concerning their sleep duration (in
time slots). Once in a sleep mode, a sensor would only wake up after its own sleep timer expires. Here,
we consider a scheduling variant of the problem which can be thought of as a sleeping problem with an
external wake-up mechanism, i.e., sensors can be woken up by external means (e.g., a low-power wake-up
radio). At time k, the permissible control actions for an n-sensor scheduling problem are n-dimensional
binary vectors, i.e., vectors in $\{0,1\}^n$ (corresponding to the set of sensor nodes to activate at each time step),
in contrast to vectors in $\mathbb{N}_0^{n_a(k)}$ for the sleeping problem (corresponding to the sleep durations of awake
sensors), where $\mathbb{N}_0$ is the set of non-negative integers and $n_a(k)$ is the number of awake sensors at time
k. While this does not address the combinatorial nature of the control space, the simpler structure of the
control space for the scheduling problem enables efficient approximate solution methodologies for the
more realistic models that we study in this paper.

A significant body of related research work considers sensor management for tasking sensors in 
dynamically evolving environments. Castanon has developed an approximate dynamic programming 
approach for dynamic scheduling of multi-mode sensor resources for the classification of a large number of 
unknown objects. The goal is to achieve an accurate classification of each object at the end of a fixed finite 
horizon by assigning different sensor modes to different objects subject to periodic or total resource usage 
constraints. Mode allocation strategies are computed based on Lagrangian relaxation for an approximate 
optimization problem wherein sample-path resource constraints are replaced by expected value constraints.
In the context of sensor scheduling for target tracking, information-based approaches [3], [4] have been
developed for optimizing tracking performance subject to an explicit constraint on communication costs
in a decentralized setting. Williams et al. [3] also adopt a Lagrangian relaxation approach to solve a
constrained dynamic program over a rolling horizon. There, the combinatorial complexity of the decision
space is avoided by first selecting one leader node, followed by greedy sensor subset selection. Other
related work on sensor scheduling includes leader-based distributed tracking schemes [6], where at
any time instant there is only one sensor active, namely, the leader sensor, which changes dynamically
as a function of the object state, while the rest of the network is idle.

While previous work focused on developing distributed implementations of efficient sensor scheduling 
strategies, our goal here is to study the fundamental theory of sensor scheduling for tracking and 
surveillance applications. Specifically, to study the fundamental tradeoff between tracking
performance and energy expenditure explicitly, we define a unified objective function that combines tracking and
energy costs, trading off the complexity of the per-stage costs to better capture the inherent energy-tracking
tradeoff. We adopt a bottom-up approach where we consider a range of sensing, motion and cost models
with increasing levels of difficulty and devise suboptimal scheduling policies to balance the tradeoff 
between energy expenditure and tracking performance. In some cases we are also able to derive lower 
bounds on the optimal energy-tracking tradeoff. 

Due to noise and model uncertainties, natural limitations of the measurement devices, or incomplete 
data about the surroundings, we need to design scheduling policies when the system's state is only partially 
observable to the controller. Partially-Observable Markov Decision Processes (POMDPs) provide a natural 
framework for addressing sequential decision problems where the goal is to find a policy (strategy) for 
selecting actions based on the information available to the controller while addressing both short-term and 
long-term benefits and costs. Solving POMDPs optimally is generally intractable. For example, the value 
function for a POMDP with a finite state space depends on information states consisting of conditional 
probability vectors of dimension equal to the number of states. This has led to a number of POMDP
approximations, and we refer the reader to Monahan [7] and Hauskrecht [8] for excellent surveys on
approximate methods for stochastic dynamic programming. Usually, no single approximation can be
prescribed for all POMDPs; rather, approximations can be judiciously used to exploit specific problem
structures. In this paper, we use a subset of these approximate solution techniques, including
reduced-uncertainty and point-based approximations [9]-[12]. The former assumes that more information would be
available to the controller at future time steps, and the latter solves a reduced optimization problem based 
on a relatively small subset of sampled beliefs about the object's state. We devise different approaches to 
deal with the aforementioned computational complexity of the decision space. In one approach, instead 
of solving one large combinatorial problem, we solve a set of simpler subproblems based on the intuition 
gained from a simplistic sensing model. In another approach, we iteratively sample control actions from 
a reduced control space based on the sparsity of a reachable belief set combined with point-based value 
updates. 

The remainder of this paper is organized as follows. In Section II, we describe the tracking problem
and define the sensing, transition and cost models, as well as the optimization problem, for each of the
considered models. In Section III, we describe approximate strategies to generate suboptimal scheduling
policies. In Section IV, we present some experimental results, and finally, in Section V, we provide some
concluding remarks.



II. Scheduling Problem 

In the following we consider different models with increasing levels of difficulty. Depending on the
structure of the model, we devise approximate methods to address the associated difficulties and generate
efficient scheduling policies. For notation, vectors are denoted by bold lower-case letters. Superscript T
denotes transposition and the indicator function is written as $\mathbb{1}\{\cdot\}$.

A. Simple sensing, observation and cost models 

In this model, the network is divided into n distinct cells, one for each sensor. In other words, each cell 
corresponds to the sensing range of one particular sensor and sensors' ranges do not overlap. A Markov 
chain with an $(n+1) \times (n+1)$ probability transition matrix P describes the motion of the target through
the field of interest. The extra state is for an absorbing termination state of the Markov chain which is 
reached when the object leaves the network. It is further assumed that all information about the object 
trajectory is stored at some central unit and is used to determine the scheduling actions for the different 
sensors. 

We let $u_{k,\ell}$ denote the action for sensor $\ell$ at time k: $u_{k,\ell} = 1$ if sensor $\ell$ is activated at time k + 1, and
$u_{k,\ell} = 0$ if the decision is to turn it off. The action vector at time k, denoted $\mathbf{u}_k$, is a binary vector of size $n \times 1$,
one decision per sensor. In this simplistic model, we assume that the target is perfectly observable within
the cell of an awake sensor or if it reaches the terminal state $\tau$; otherwise it is unobservable. Thus, the
observation $s_k$ at time k is defined according to:

$$
s_k = \begin{cases}
b_k, & \text{if } b_k \neq \tau \text{ and } u_{k-1,b_k} = 1; \\
\varepsilon, & \text{if } b_k \neq \tau \text{ and } u_{k-1,b_k} = 0; \\
\tau, & \text{if } b_k = \tau,
\end{cases} \tag{1}
$$

where $\varepsilon$ stands for erasure. The observation model in (1) induces a well-defined probabilistic observation
model $p(s_k \mid b_k, \mathbf{u}_{k-1})$ such that the current observation depends on the actual target location and the
scheduling action for the n sensors.

At each time step, the incurred cost is the sum of the energy and the tracking costs. An energy cost 
of $c \in (0, 1]$ per unit time is incurred for every active sensor, and a tracking cost of 1 is incurred for each time unit
that the object is not observed. Once state $\tau$ is reached the problem terminates and no further cost is
incurred. In other words, $\tau$ is an absorbing cost-free state; all n other states are transient so that $\tau$ is the only
recurrent class of the Markov chain. Hence,

$$
g(b_k, \mathbf{u}_{k-1}) = \mathbb{1}\{b_k \neq \tau\} \left( \mathbb{1}\{u_{k-1,b_k} = 0\} + c \sum_{\ell=1}^{n} \mathbb{1}\{u_{k-1,\ell} = 1\} \right). \tag{2}
$$

The parameter c is thus used to trade off energy consumption and tracking errors.
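As a concrete illustration of the observation rule (1) and the per-stage cost (2), the following minimal Python sketch may help. It assumes cells are indexed 0, ..., n-1, that the terminal state is passed in as the index `tau`, and that erasure is represented by `None`; the function names `observe` and `stage_cost` are hypothetical and not part of the paper.

```python
ERASURE = None  # hypothetical stand-in for the erasure symbol


def observe(b, u_prev, tau):
    """Observation rule of the non-overlapping model.

    b      : current target state (cell index, or tau)
    u_prev : binary activation list chosen at the previous step (length n)
    tau    : index used for the terminal state (assumed not to be a cell index)
    """
    if b == tau:
        return tau                      # termination is always observed
    return b if u_prev[b] == 1 else ERASURE


def stage_cost(b, u_prev, tau, c):
    """Unit tracking cost if the target's cell is asleep, plus c per awake sensor."""
    if b == tau:
        return 0.0                      # terminal state is cost-free
    tracking = 1.0 if u_prev[b] == 0 else 0.0
    energy = c * sum(u_prev)
    return tracking + energy
```

For example, with three sensors, `u_prev = [0, 1, 1]` and `c = 0.25`, a target in cell 0 is missed (cost 1.5) while a target in cell 1 is observed (cost 0.5).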

B. Overlapping sensors with discrete observation models

In this model, we continue to use a discrete model for the target transition but we define a new
sensing model and cost structure to account for the fact that sensors could have overlapping visibility
regions. Within that model we further consider simple and probabilistic sensing.

1) Overlapping sensors with simple sensing: In this case, the target is perfectly observed within the
visibility region of any active sensor. Denote by $\mathcal{R}_i$ the set of locations in the visibility region of sensor
i and by $\mathcal{B}_i$ the set of sensors that observe location i. The observation at time k is as follows:

$$
s_k = \begin{cases}
b_k, & \text{if } b_k \neq \tau \text{ and } \exists j \in \mathcal{B}_{b_k} : u_{k-1,j} = 1; \\
\varepsilon, & \text{if } b_k \neq \tau \text{ and } u_{k-1,j} = 0, \ \forall j \in \mathcal{B}_{b_k}; \\
\tau, & \text{if } b_k = \tau.
\end{cases} \tag{3}
$$

Therefore, a tracking error is incurred if none of the sensors observing the current target location is
active. Redefining the cost structure for this model:

$$
g(b_k, \mathbf{u}_{k-1}) = \mathbb{1}\{b_k \neq \tau\} \left( \mathbb{1}\{u_{k-1,j} = 0, \ \forall j \in \mathcal{B}_{b_k}\} + c \sum_{\ell=1}^{n} \mathbb{1}\{u_{k-1,\ell} = 1\} \right). \tag{4}
$$

2) Overlapping sensors with probabilistic sensing: By probabilistic sensing we account for observation
uncertainty even if the target is within the visibility region of one or more active sensors. We assume, 

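$$
p(s_k \mid b_k, \mathbf{u}_{k-1}) = \begin{cases}
q, & s_k = b_k; \\
\dfrac{1-q}{|\mathcal{R}| - 1}, & s_k = i, \ \forall i \in \mathcal{R} \setminus \{b_k\},
\end{cases} \tag{5}
$$

where

$$
\mathcal{R} = \bigcap_{\substack{j \in \mathcal{B}_{b_k}, \\ u_{k-1,j} = 1}} \mathcal{R}_j \ \setminus \bigcup_{\substack{i \notin \mathcal{B}_{b_k}, \\ u_{k-1,i} = 1}} \mathcal{R}_i .
$$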

That is, the observation is uniformly distributed over the remaining locations (other than the true target 
location) that belong to the visibility regions of the set of awake sensors monitoring the true location 
b k . If the true target location does not belong to the visibility region of an awake sensor, we naturally 
exclude the visibility region of that sensor since no measurement is received from such a sensor. When
$\mathcal{R}$ is a singleton $\{b_k\}$, we set $q = 1$. A tracking error is incurred if the target is not directly observed
and the uncertainty in the target location cannot be resolved. 
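The construction of the candidate set for probabilistic sensing can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `regions[j]` is the set of locations visible to sensor j, `u[j]` is 1 when sensor j is awake, and the case where no awake sensor covers the true location is treated as an erasure (an assumption carried over from the simple sensing model, not stated for this model in the text). The names `candidate_region` and `observation_pmf` are hypothetical.

```python
def candidate_region(b, u, regions):
    """Locations consistent with the awake sensors when the target is at b.

    Intersect the regions of awake sensors that see b, then remove the regions
    of awake sensors that do not see b. Returns None when no awake sensor sees b.
    """
    watching = [j for j, r in enumerate(regions) if u[j] == 1 and b in r]
    if not watching:
        return None
    cand = set.intersection(*(set(regions[j]) for j in watching))
    for j, r in enumerate(regions):
        if u[j] == 1 and b not in r:
            cand -= set(r)
    return cand


def observation_pmf(b, u, regions, q):
    """Observation distribution: mass q on b, the rest uniform over the other candidates."""
    cand = candidate_region(b, u, regions)
    if cand is None:
        return {"erasure": 1.0}          # assumption: unobserved target yields an erasure
    if cand == {b}:
        return {b: 1.0}                  # singleton candidate set: q is set to 1
    rest = (1.0 - q) / (len(cand) - 1)
    return {loc: (q if loc == b else rest) for loc in cand}
```

With `regions = [{0, 1, 2}, {2, 3}, {1, 2}]`, a target at location 1 and sensors 0 and 2 awake, the candidate set is `{1, 2}`; waking sensor 1 as well removes location 2 and resolves the target exactly.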

C. Continuous observation, continuous state and arbitrary cost models 

In this class of models, the object sensing model allows for an arbitrary distribution for the observations 
given the current object location. Tracking cost is modeled as an arbitrary distance measure between the 
actual and the estimated object location. If we denote the set of possible object locations by $\mathcal{B}$, we have
$|\mathcal{B}| = m + 1$. Note that, in contrast to the simplistic model in Section II-A, m is different from n since object
locations are arbitrary and we no longer assume one location corresponds to the sensing range of one
particular sensor. The $(m+1)$-th state again corresponds to a termination state. Furthermore, the target
can be moving on a continuous state space, in which case m is infinite.

If the state space is discrete, then conditioned on the object state $b_k$ at time k, $b_{k+1}$ has a probability
mass function that is given by the $b_k$-th row of the transition matrix P. If the state space is continuous,
P is a kernel such that $P(x, y)$ is the probability that the next object location is in the set $y \subset \mathcal{B}$ given
that the current object location is x. For simplicity of exposition, we focus on discrete state spaces. Also, we
omit indexing time whenever the time evolution is well-understood to avoid cumbersome notation. We
consider the following observation model for illustration; however, our approach is fairly general:

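$$
p(\mathbf{s} \mid b, \mathbf{u}) = \prod_{i=1}^{n} \left[ \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{1}{2} \big( s_i - \bar{s}_i(b) \big)^2 \right) \mathbb{1}\{u_i = 1\} + \delta(s_i - \varepsilon)\, \mathbb{1}\{u_i = 0\} \right] \tag{6}
$$

where $\mathbf{s}$ is an $n \times 1$ continuous observation vector with the i-th entry, $s_i$, representing the observation of
sensor i; $p_i$, $i = 1, \ldots, n$, is the position of the i-th sensor; b is the target state; $\bar{s}_i(b)$ is the mean
received signal strength at sensor i; and $\varepsilon$ stands for erasure.
$\delta(\cdot)$ is the Dirac delta function. In (6), the observation of an active sensor is Gaussian with a mean
received signal strength $\bar{s}_i(b)$ inversely proportional to the square of the distance between the sensor position $p_i$ and the
actual target location. The observation of an inactive sensor is just an erasure.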

The estimated target location (given the entire history) is denoted by $\hat{b}$. We define the tracking error
through an arbitrary bounded distance function $d(b, \hat{b})$ between the actual and the estimated object
locations, which can be the Hamming distance $d(b, \hat{b}) = \mathbb{1}\{b \neq \hat{b}\}$ or the Euclidean distance for discrete
and continuous state spaces, respectively. The control at each time step is the tuple $(\mathbf{u}_k, \hat{b}_k)$. Since $\hat{b}_k$ does
not affect the state evolution, the optimal value for $\hat{b}_k$ is the value that minimizes the tracking cost over
a single time step given the history up to time k, i.e.,

$$
\hat{b}_k = \arg\min_{\hat{b}} \, E\big[ d(b_k, \hat{b}) \mid I_k \big] \tag{7}
$$

where $I_k$ denotes the information state, i.e., the total information available to the central controller at
time k, which is given by

$$
I_k = \{ \mathbf{s}_0, \mathbf{s}_1, \ldots, \mathbf{s}_k, \mathbf{u}_0, \mathbf{u}_1, \ldots, \mathbf{u}_{k-1} \}.
$$

In the case of Hamming cost, it follows that $\hat{b}_k$ is simply the MAP decision, i.e., $\hat{b}_k = \arg\max_b p_k(b)$.
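The MAP claim for the Hamming cost follows because guessing state $\hat{b}$ incurs expected cost $1 - p_k(\hat{b})$, so minimizing the cost is the same as maximizing the posterior. A minimal numerical sketch, assuming the belief is given as a finite probability vector (the function name `hamming_estimate` is hypothetical):

```python
import numpy as np


def hamming_estimate(p):
    """Return the state minimising the expected Hamming cost under belief p."""
    p = np.asarray(p, dtype=float)
    expected_cost = 1.0 - p          # expected Hamming cost of guessing each state
    return int(np.argmin(expected_cost))
```

For the belief `[0.2, 0.5, 0.3]` this returns state 1, which is also the posterior mode.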

D. Optimal scheduling policy 

The design of an optimal scheduling policy depends on the history up to time k, i.e., the information
state $I_k$. However, the posterior probability distribution, $p_k = \Pr[b_k \mid I_k]$, of the target's state given $I_k$ is
a sufficient statistic for this class of partially observable processes. The distribution $p_k$, also known as the
belief, summarizes all the information needed for optimal control. The sufficient statistic itself forms a
Markov process whose evolution can be obtained through Bayes' rule updates [7].¹ For example, the belief
update equation for the simplistic model in Section II-A can be written as:

$$
p_{k+1} = \begin{cases}
e_\tau, & \text{if } s_{k+1} = \tau; \\
e_{b_{k+1}}, & \text{if } s_{k+1} = b_{k+1}; \\
[p_k P]_{\{j : u_{k,j} = 0\}}, & \text{if } s_{k+1} = \varepsilon,
\end{cases} \tag{8}
$$

where $e_i$ is a row vector with a 1 at the i-th entry and 0 elsewhere. The vector $[p_k P]_{\mathcal{S}}$ is the probability
vector formed by setting the i-th entry $[p_k P]_i$ of the vector $p_k P$ to zero, $\forall i \notin \mathcal{S}$, and then normalizing
the vector into a probability distribution. The set $\{j : u_{k,j} = 0\}$ signifies the set of deactivated sensors.
In other words, the updated belief for the model in Section II-A is a point mass distribution concentrated at $\tau$
if the object exits the network, and concentrated at $b_{k+1}$ if the object is observed. When the object is
unobservable, we eliminate the probability mass at all sensors that are awake, since the object cannot be
at these locations, and normalize. The multi-valued function in (8), and equivalent Bayes' updates for
the other models, define a transformation $p_{k+1} = \phi(p_k, s_{k+1}, \mathbf{u}_k)$, mapping the current belief $p_k$, the
current control vector $\mathbf{u}_k$, and the future observation $s_{k+1}$, to a future belief.
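The belief update for the non-overlapping model can be sketched as follows. This is a minimal illustration, assuming cells are indexed 0, ..., n-1, the terminal state is index n, and an erasure is represented by `None`; the function name `belief_update` is hypothetical. The sketch does not guard against an erasure that has zero probability under the current belief and action.

```python
import numpy as np

ERASURE = None  # hypothetical stand-in for the erasure symbol


def belief_update(p, P, u, s):
    """One Bayes step: belief p (length n+1), transition matrix P, action u, observation s."""
    n = len(u)
    if s is not ERASURE:
        # Target observed in cell s, or termination observed (s == n): point mass.
        e = np.zeros(n + 1)
        e[s] = 1.0
        return e
    pred = np.asarray(p, dtype=float) @ np.asarray(P, dtype=float)
    mask = np.zeros(n + 1)
    mask[:n] = 1 - np.asarray(u)     # keep mass only on cells whose sensor is asleep
    post = pred * mask               # awake cells and the terminal state are ruled out
    return post / post.sum()
```

For a two-cell network with the target known to be in cell 0, waking only sensor 0 and receiving an erasure places all posterior mass on cell 1.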

¹ Equivalently, for a continuous state space, a sufficient statistic would be $p_k(\mathcal{X}) = \Pr[b_k \in \mathcal{X} \mid I_k]$. The updated belief $p_{k+1}$
can be computed using standard Bayesian non-linear filtering as the posterior measure resulting from prior measure $pP$ and
observation $s_{k+1}$.

The policy $\mathbf{u}_k = \mu_k(I_k)$ is defined as a mapping from information states $I_k$ to control actions $\mathbf{u}_k$. The
goal is to design a policy that minimizes the expected sum of costs J, where

$$
J(I_0, \mu_0, \mu_1, \ldots) = E\left[ \sum_{k=1}^{\infty} g(b_k, \mathbf{u}_{k-1}) \,\Big|\, I_0 \right]. \tag{9}
$$
J is well-defined since g is upper bounded by $cn + 1$ (regardless of the model) and the expected time
until the object exits the network is finite. Note that termination is inevitable; thus the objective is to
reach the termination state with minimal expected cost. Hence, the scheduling policy is the solution of
the minimization problem

$$
J^* = \min_{\mu_0, \mu_1, \ldots} J(I_0, \mu_0, \mu_1, \ldots). \tag{10}
$$

This POMDP problem falls within the class of infinite horizon stochastic shortest path problems.
Noting that the termination state is observable, cost-free and absorbing, and that every policy is proper²,
a stationary policy, i.e., one which does not depend on k, is optimal in the class of all history-dependent
policies and $p_k$ is a sufficient statistic for control [13]; i.e., the optimal control $\mathbf{u}_k^* = \mu^*(p_k)$ is defined through
a time-invariant mapping from the belief space to the action space. J can be written in terms of the
sufficient statistic and the optimal policy can be obtained from the solution of the Bellman equation:

$$
J(p) = \min_{\mathbf{u}} \left\{ E\big[ g(b', \mathbf{u}) \mid p, \mathbf{u} \big] + \sum_{s} p(s \mid p, \mathbf{u})\, J\big( \phi(p, s, \mathbf{u}) \big) \right\} \tag{11}
$$

such that $J(e_\tau) = 0$, where $J(\cdot)$ is the value function for the POMDP, and the expectation is taken
over the future state $b'$, which is distributed according to $pP$. Note that we removed the time dependence
due to the aforementioned time invariance property. For continuous observations, summation over s is
replaced by an integration.

III. Approximate Solutions and Lower bounds 

There are a number of algorithms for solving POMDPs exactly [14]-[16]. These algorithms rely on the
powerful result of Sondik that the optimal value function for any POMDP can be approximated arbitrarily
closely using a set of hyper-planes ($\alpha$-vectors) defined over the belief simplex [14]. This fact is the basis
for exact value-iteration-based algorithms, such as the Witness algorithm [11], for computing the value
function. The result is a value function parameterized by a number of hyper-planes (or vectors) whereby
the belief space is partitioned into a finite number of regions. Each vector minimizes the value function 
over a certain region of the belief space and has a control action associated with it, which is the optimal 
control for beliefs in its region. 

² A proper policy is a policy that leads to the termination state with probability one regardless of the initial state. In our
problem, the scheduling policy does not affect the target motion and all policies are proper in the sense that there is a positive
probability that the target will reach the termination state after a finite number of stages.

To clarify, in value iteration we generally start with some initial estimate for $J^*$ and repeatedly apply
the transformation defined by the right-hand side of the Bellman equation (11) until the sequence of cost
functions converges. Let $\{\alpha_i^{(k)}\}_{i=1}^{|J^{(k)}|}$ denote the set of vectors parameterizing the value function
after k iterations, where $|J^{(k)}|$ is the total number of hyper-planes, and $\alpha_i^{(k)}(b)$, which is a hyperplane
in the belief space, represents the value of executing the k-step policy associated with the i-th vector
starting from a state b. Hence, the value of executing the i-th hyperplane policy starting from a belief
state p is simply the dot product of $\alpha_i^{(k)}$ and p:

$$
p \cdot \alpha_i^{(k)} = \sum_{b} p(b)\, \alpha_i^{(k)}(b).
$$

Therefore, the value of the optimal k-step policy starting at p is simply the minimum dot product over
all hyperplanes, i.e.,

$$
J^{*(k)}(p) = \min_i \; p \cdot \alpha_i^{(k)}.
$$

Hence, $J^{*(k)}(p)$ is piecewise linear and concave. Some of the vectors (also known as policy trees)
may be dominated by others in the sense that they are not optimal at any region in the belief simplex.
Thus, many exact algorithms devise pruning mechanisms whereby a parsimonious representation with a
minimal set of non-dominated hyper-planes is maintained [7].
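The piecewise-linear representation can be made concrete with a short Python sketch. It evaluates the value at a belief as the minimum dot product over the stored vectors, and removes vectors that are pointwise dominated. Note that pointwise dominance is only a simple, conservative check used here for illustration; exact algorithms typically rely on linear programs to find every vector that is nowhere optimal. The function names are hypothetical.

```python
import numpy as np


def value_and_vector(p, alphas):
    """Return the cost-to-go at belief p and the index of the minimising vector."""
    vals = np.asarray(alphas, dtype=float) @ np.asarray(p, dtype=float)
    i = int(np.argmin(vals))
    return float(vals[i]), i


def prune_pointwise_dominated(alphas):
    """Drop vectors that are nowhere smaller than some other vector (cost minimisation)."""
    alphas = np.asarray(alphas, dtype=float)
    keep = []
    for i, a in enumerate(alphas):
        dominated = any(
            j != i and np.all(alphas[j] <= a) and np.any(alphas[j] < a)
            for j in range(len(alphas))
        )
        if not dominated:
            keep.append(i)
    return alphas[keep]
```

With vectors `[1, 3]`, `[3, 1]` and `[4, 4]`, the third is dominated by both others and is pruned; at belief `[0.75, 0.25]` the first vector is the minimiser with value 1.5.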

Even though the aforementioned linearity/concavity property makes the policy search a great deal
simpler, the exact computation is generally intractable except for relatively small problems. The two
major difficulties for exact computation arise from the exponential growth of the number of vectors with the planning
horizon and with the number of observations, and the inefficiencies related to the identification of such vectors
and subsequently pruning them. Namely, the number of hyper-planes grows doubly exponentially such
that after k steps the number of hyperplanes is $O(|U|^{|S|^{k-1}})$, where $|U|$ and $|S|$ denote the cardinality of
the control and observation spaces, respectively. Equivalently, the number of hyperplanes per iteration
grows as:

$$
|J^{(k+1)}| = O\big( |U| \, |J^{(k)}|^{|S|} \big).
$$

This has led to a number of approximations and suboptimal solution techniques trading off solution
quality for speed.
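To get a feel for this growth, the per-iteration recursion can be iterated directly. The sketch below treats the recursion as an equality (a worst case before any pruning) and assumes a single initial vector; the function name `num_hyperplanes` is hypothetical.

```python
def num_hyperplanes(num_actions, num_observations, k):
    """Worst-case number of vectors after k exact backups, with no pruning."""
    count = 1  # assumption: value iteration starts from a single vector
    for _ in range(k):
        count = num_actions * count ** num_observations
    return count
```

As an illustrative case, take three sensors in the non-overlapping model, so that there are 8 actions, and count 5 observations (three cells, erasure, termination). Two backups already give `num_hyperplanes(8, 5, 2) == 262144` vectors.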

Remark III.1. The intractability of the optimal solution for our problem is primarily due to the following
reasons:

(i) The cost function is minimized over the simplex of probability distributions, i.e., the $(m-1)$-dimensional
belief simplex for m-state discrete state-space models, and the space of probability
density functions for continuous state-space models.
(ii) The exponential explosion of the action space with the number of sensors ($2^n$ actions).
(iii) The exponential growth of the number of $\alpha$-vectors with the planning horizon and with the number of
observations, especially for continuous observation models.

A. Approximate solutions 

In this section, we outline our approximate solution methodologies for the different models introduced
in Section II. First, we consider approximations where it is assumed that more information becomes
available to the controller at future time steps. Policies based on the assumption that uncertainty in the
current belief state will be gone after the next action were first introduced within the artificial intelligence
community and are known as $Q_{\mathrm{MDP}}$ policies [10], [11]. We show that under an observable-after-control
assumption, our sensor scheduling problem decomposes into n simpler subproblems, one subproblem
per sensor, for the simplistic model of Section II-A. These subproblems can then be solved exactly using policy
iteration [13]. Furthermore, in this case, the $Q_{\mathrm{MDP}}$ solution gives us a lower bound on the optimal
tracking-energy tradeoff. Unfortunately, this natural decomposition does not extend to the other classes of
models due to the inherent coupling of their tracking errors. However, based on intuition gained from
the simplistic model, we artificially decouple the scheduling problem for those models and individually
learn the tracking costs corresponding to each subproblem under the aforementioned $Q_{\mathrm{MDP}}$ assumption.
This approach combines $Q_{\mathrm{MDP}}$ with reinforcement learning [18].

Second, we develop sensor scheduling strategies based on point-based approximations. Despite the
fact that the generated $Q_{\mathrm{MDP}}$-based policies perform reasonably well, the resulting policies generally
do not take actions to gain information (an effect of the observable-after-control assumption), leading
to situations wherein the belief state does not get updated appropriately. Furthermore, while decoupling
the scheduling problem provides close-to-optimal performance for uncoupled or lightly-coupled sensing
and tracking models (see Section IV), it might come at the expense of a reduction in solution quality for
more realistic or heavily-coupled models. To that end, we develop point-based approximate scheduling
policies. While our previous approach reduced complexity via decoupling and learning, the key idea here
is to optimize the value function only for a small set of reachable beliefs $V$ and not over the entire
belief simplex. Point-based methods have shown great potential for solving large scale POMDPs, mostly
for robotic applications [8], [9], [12], [19]. Pineau et al. [9] proposed point-based value iteration (PBVI),
which performs point-based backups only at a discrete set of reachable belief points that can actually be
encountered by interacting with the environment. Developing a class of point-based algorithms, which
mostly differ in the way the subset of belief points is chosen and the execution order of the backup
operations over the selected belief points, has been the focus of recent algorithm-development research
targeting large scale POMDPs. Perseus [12] is one such randomized point-based algorithm that maintains
a fixed set of belief points. There, backup speedups can be obtained by exploiting the key observation
that a single backup may improve the value of many belief points simultaneously. These algorithms were
designed to deal with large state spaces; yet two extra difficulties in the scheduling problem arise from
the size of the action space, $2^n$ (for all models), and of the observation space (for the models in Section II-C).
Regarding the dimensionality of the action space, we devise a strategy to sample actions based on the
support of the beliefs and the sparse structure of the transition models. Intuitively speaking, an object can
only move from one side of the network to the other side within time constraints, rendering exponentially
many scheduling actions irrational at certain times. Hence, instead of performing full updates including
$2^n$ actions, we perform the minimization over a reduced control space $U(p)$ for every $p \in V$ (see Section
III-C1). When dealing with continuous or large observation spaces, we combine that with a methodology that
aggregates observations and uses aggregate observations for value iteration updates (Section III-C2). At
the core of the algorithm we use Perseus [12], a variant of PBVI [9], whereby value iteration updates
are not carried out for every sampled belief. Instead, the values for many belief points are improved
simultaneously in one update. Fig. 1 depicts the structure of our point-based approximation, combining
control space reduction and observation aggregation with point-based updates.

Fig. 1: Structure of the point-based scheduling approximation
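The intuition behind the reduced control space can be illustrated with a small Python sketch. This is not the sampling procedure of Section III-C1, which is not reproduced here; it is only one simple way, under the non-overlapping model (cell i monitored by sensor i, last state terminal), to discard activation vectors that wake sensors in cells the target cannot reach in one step from the support of the current belief. The function name `reduced_actions` and the tolerance `tol` are hypothetical.

```python
import itertools
import numpy as np


def reduced_actions(p, P, tol=1e-12):
    """Enumerate activation vectors that only wake sensors in one-step reachable cells."""
    n = len(p) - 1
    pred = np.asarray(p, dtype=float) @ np.asarray(P, dtype=float)
    candidates = [i for i in range(n) if pred[i] > tol]   # cells with predicted mass
    actions = []
    for bits in itertools.product((0, 1), repeat=len(candidates)):
        u = np.zeros(n, dtype=int)
        for i, bit in zip(candidates, bits):
            u[i] = bit
        actions.append(u)
    return actions
```

On a three-cell line network where the target is known to be in the first cell and can move at most one cell per step, only the first two sensors are candidates, so 4 activation vectors are enumerated instead of 8.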

B. $Q_{\mathrm{MDP}}$-based scheduling policies

Next, we consider our first class of policies based on the $Q_{\mathrm{MDP}}$ reduced future uncertainty assumption.
First, we consider the simplistic model in Section II-A; then we use the intuition we developed from this
model to devise similar policies for the other models. Since the POMDP is a stochastic shortest path
problem with an absorbing cost-free termination state, and the expected termination time is finite, the
cost-to-go function for a given belief can be written as the minimum of the dot product of the belief
vector and a set of hyper-planes ($\alpha$-vectors):



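$$
\begin{aligned}
J(p) &= \min_i \sum_{b} \alpha_i(b)\, p(b) \\
&= \min_{\mathbf{u} \in \{0,1\}^n} \Bigg\{ \sum_{\ell=1}^{n} \Big( [pP]_\ell\, \mathbb{1}\{u_\ell = 0\} + \sum_{i=1}^{n} c\,[pP]_i\, \mathbb{1}\{u_\ell = 1\} \Big) \\
&\qquad + \sum_{s \in \{1, \ldots, n, \varepsilon\}} \min_i \sum_{b'} p(s \mid \mathbf{u}, b') \sum_{b} p(b' \mid b)\, p(b)\, \alpha_i(b') \Bigg\}
\end{aligned} \tag{12}
$$

where $\{\alpha_i\}$ is the set of hyperplanes constituting the value function J. In essence, the complexity of the
Bellman equation (12) stems from the evolution of the belief $p_k$ in (8). We can see why (12) is hard
to analyze if we further divide the second term in the summation into two terms depending on whether
there is observability or there is an erasure,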



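$$
\begin{aligned}
J(p) = \min_{\mathbf{u} \in \{0,1\}^n} \Bigg\{ & \sum_{\ell=1}^{n} \Big( [pP]_\ell\, \mathbb{1}\{u_\ell = 0\} + \sum_{i=1}^{n} c\,[pP]_i\, \mathbb{1}\{u_\ell = 1\} \Big) \\
& + \sum_{b'} \mathbb{1}\{u_{b'} = 1\}\, [pP]_{b'} \min_i \alpha_i(b') + \min_i \sum_{b'} \mathbb{1}\{u_{b'} = 0\}\, [pP]_{b'}\, \alpha_i(b') \Bigg\}.
\end{aligned} \tag{13}
$$

To further clarify, we observe that:

$$
\sum_{s} p(s \mid \mathbf{u}, p)\, J\big( \phi(p, s, \mathbf{u}) \big) = \sum_{\ell=1}^{n} \mathbb{1}\{u_\ell = 1\}\, [pP]_\ell\, J(e_\ell) + \sum_{\ell=1}^{n} \mathbb{1}\{u_\ell = 0\}\, [pP]_\ell\, J\big( [pP]_{\{i : u_i = 0\}} \big) \tag{14}
$$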

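and the minimization problem is coupled across the sensors, as the second term in (14), which is due
to non-observability, depends on the action vector $\mathbf{u}$. The action of one sensor affects the belief evolution,
thereby coupling the problem across sensors. Now, if we make the assumption that perfect observations
would be available to the controller after taking a scheduling action, we obtain an approximate surrogate
function which can be used to generate a suboptimal scheduling policy. Namely, we set $p(s \mid \mathbf{u}, b') =
\delta(s - b')$ in (12). We get

$$
\begin{aligned}
J(p) &= \min_{\mathbf{u} \in \{0,1\}^n} \Bigg\{ \sum_{\ell=1}^{n} \Big( [pP]_\ell\, \mathbb{1}\{u_\ell = 0\} + \sum_{i=1}^{n} c\,[pP]_i\, \mathbb{1}\{u_\ell = 1\} \Big) + \sum_{b'} [pP]_{b'} \min_i \alpha_i \cdot e_{b'} \Bigg\} \\
&= \min_{\mathbf{u} \in \{0,1\}^n} \Bigg\{ \sum_{\ell=1}^{n} \Big( [pP]_\ell\, \mathbb{1}\{u_\ell = 0\} + \sum_{i=1}^{n} c\,[pP]_i\, \mathbb{1}\{u_\ell = 1\} \Big) + \sum_{b'} [pP]_{b'}\, J(e_{b'}) \Bigg\}.
\end{aligned} \tag{15}
$$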

The terms in the summation in (15) only depend on the control action for each sensor. Furthermore,
the belief evolution is independent of the scheduling action, wherefore the approximate recursion in (15)
decomposes into separable terms, one per sensor. Hence, the value function and the scheduling policy for
sensor $\ell$, under the observable-after-control assumption, can be obtained from the solution of the per-sensor
Bellman equation:

$$
J^{(\ell)}(p) = \min_{u_\ell \in \{0,1\}} \Bigg\{ [pP]_\ell\, \mathbb{1}\{u_\ell = 0\} + \sum_{i=1}^{n} c\,[pP]_i\, \mathbb{1}\{u_\ell = 1\} + \sum_{b'} [pP]_{b'}\, J^{(\ell)}(e_{b'}) \Bigg\}. \tag{16}
$$

The POMDP problem is now decomposed into n separate simpler subproblems such that the total cost
function is the sum of the per-sensor cost functions, while the overall scheduling policy is the per-sensor
policies applied in parallel. Each subproblem can be easily solved using standard policy iteration [13]
with a simple minimization over a binary control action.
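A per-sensor subproblem of this kind can be sketched in Python as follows. The sketch assumes the per-sensor stage cost is $[pP]_\ell$ when sensor $\ell$ is off and $c \sum_i [pP]_i$ when it is on, and that the terminal state is the last row and column of P. Because the future-cost term does not depend on the action under the observable-after-control assumption, the minimizing action can be read off the stage cost directly, and the values at point-mass beliefs follow from a single linear solve rather than an iterative policy-iteration loop (a simplification chosen here for brevity). The function names are hypothetical.

```python
import numpy as np


def qmdp_per_sensor_values(P, c):
    """V[l, b] = per-sensor cost-to-go for sensor l when the target is known to be in cell b."""
    P = np.asarray(P, dtype=float)
    n = P.shape[0] - 1
    Q = P[:n, :n]                      # transitions among the n transient cells
    stay = Q.sum(axis=1)               # probability of not terminating from each cell
    V = np.zeros((n, n))
    for l in range(n):
        # Stage cost from a point mass at b: miss probability if off, energy if on.
        stage = np.minimum(Q[:, l], c * stay)
        V[l] = np.linalg.solve(np.eye(n) - Q, stage)
    return V


def qmdp_action(p, P, c):
    """Wake sensor l exactly when the predicted miss probability exceeds the energy cost."""
    P = np.asarray(P, dtype=float)
    n = P.shape[0] - 1
    pred = (np.asarray(p, dtype=float) @ P)[:n]
    return (pred > c * pred.sum()).astype(int)
```

For a two-cell network with `P = [[0.5, 0.3, 0.2], [0.3, 0.5, 0.2], [0, 0, 1]]` and `c = 0.5`, a target known to be in cell 0 leads to waking only sensor 0, since the predicted mass 0.5 exceeds the energy term 0.4 while 0.3 does not.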

Fundamentally, for the simplistic model, we were able to decompose the problem into n simpler
subproblems due to the separability of the tracking cost into per-sensor costs. Note that the problem is
still coupled due to the belief evolution in (8), yet that coupling is resolved under the
observable-after-control assumption.

While separability holds for the simplistic model, this is not the case for the other models. Hence, we 
devise a strategy where we artificially decouple the problem into n simpler subproblems. To this end, 
we perform Monte Carlo simulations to determine appropriate values for the per-sensor tracking cost 
corresponding to each subproblem. For example, consider the continuous observation model of Section 
II-C. For simplicity of exposition, assume a discrete state space model with m possible object locations. 
In this case, we define a surrogate value function for the ℓ-th subproblem as follows: 

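\[
J^{(\ell)}(p) = \min_{u \in \{0,1\}} \Big\{ \mathbb{1}\{u = 0\} \sum_{i=1}^{m} p(i)\, T(i,\ell) + \mathbb{1}\{u = 1\} \sum_{i=1}^{m} c\, [pP]_i + \sum_{i=1}^{m} [pP]_i\, J^{(\ell)}(e_i) \Big\}, \quad \ell = 1, \dots, n \qquad (17)
\]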

where T(i,ℓ) captures the contribution of the ℓ-th sensor to the total tracking error when the target's 
previous state is i, and is obtained via Monte Carlo simulations. Namely, the expected tracking cost can 
be evaluated by repeatedly simulating the system from time k − 1 to time k while changing the state of 
the ℓ-th sensor. Similarly, (17) can be generalized to continuous state spaces. 

Even though the Q_MDP assumption leads to a separable problem and provides a lower bound on the 
optimal energy-tracking tradeoff for the simplistic model, as we elaborate in Section III-D, the resulting 
scheduling policies are myopic, unlike the sleeping policies in [1]. This follows from the fact that, under an 
observable-after-control assumption, the future cost term is independent of the control vector u. Therefore, 
we consider more efficient, albeit more difficult, point-based approximations in the next section. 

C. Point-based approximate policies 

In the previous section, we described Q_MDP-based policies, whereby issues (i) and (iii) in Remark III.1 
are resolved since we only needed to solve the underlying Markov Decision Process to describe the full 
approximate surrogate function. Decoupling the problem into one-per-sensor subproblems (naturally or 
artificially) further enabled us to address issue (ii). Yet, we just argued in Sections III-A and III-B that the 
resulting scheduling policies are myopic and generally do not take control actions to gain information. 

To that end, we develop point-based approximate scheduling policies. Instead of reducing complexity 
via artificial decoupling and learning, the key idea here is to optimize the value function only for 
specific reachable sampled beliefs and not over the entire belief simplex (addressing issue (i) in Remark 
III.1). Such techniques have shown great potential for solving large-scale POMDPs while significantly 
reducing complexity. Due to the large size of the control space, we also devise strategies to sample 
actions exploiting the sparsity of the beliefs and the problem structure (to address issue (ii)). Moreover, 
observation aggregation is used for continuous observation models. Furthermore, since Perseus updates 
are not carried out for every sampled belief and multiple belief points are improved simultaneously, the 
number of α vectors grows modestly with the number of iterations. This addresses issue (iii) in Remark 
III.1. 

For completeness, we first briefly outline the steps of Perseus and refer the reader to [11], [12] for 
further details. Later, we discuss specific variations of the algorithm to address the dimensionality of the 
action and the observation spaces. 
One iteration of Perseus 

1) Sample a set of belief points V. We obtain these beliefs by simulating the target motion through 
the field, taking random actions and generating observations according to the observation models in 
CD, ©, ©, and © 

2) Sample a belief point p ∈ V at random and compute the backup using (18a) and (18b): 

\[
\alpha = \arg\min_{\{\alpha_u\}_{u \in \mathcal{U}}} p \cdot \alpha_u \qquad (18a)
\]

where 

\[
\alpha_u = d(b,u) + \sum_{s} p(s \mid u, p) \min_{i}\, \phi(p,u,s) \cdot \alpha_i^{k} \qquad (18b)
\]

3) If Σ_b p(b) α(b) ≤ J^(k)(p), then add the new α to J^(k+1); otherwise keep the old hyperplane. 

4) If {p ∈ V : J^(k+1)(p) > J^(k)(p)} = ∅, i.e., the empty set, the iteration is complete; otherwise repeat from 
step 2). 
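The steps above can be sketched for a small discrete model as follows. This is only an illustrative sketch under stated assumptions, not the implementation used in the paper: states, actions, and observations are finite; P[b][b2] is an action-independent transition matrix; O[u][b2][s] is the probability of observation s under action u when the next state is b2; C[u][b] is the stage cost; and a discount factor `gamma` is introduced purely to keep the toy example bounded. All function names are hypothetical.

```python
import random


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def value(p, alphas):
    """Piecewise-linear cost-to-go: minimum over the alpha vectors."""
    return min(dot(p, a) for a in alphas)


def backup(p, alphas, P, O, C, gamma):
    """Point-based backup at belief p for a cost-minimization problem."""
    n = len(p)
    best = None
    for u in range(len(C)):
        alpha_u = list(C[u])
        for s in range(len(O[u][0])):
            # Back-project every alpha vector through observation s.
            proj = [[sum(P[b][b2] * O[u][b2][s] * a[b2] for b2 in range(n))
                     for b in range(n)] for a in alphas]
            g = min(proj, key=lambda v: dot(p, v))
            alpha_u = [x + gamma * y for x, y in zip(alpha_u, g)]
        if best is None or dot(p, alpha_u) < dot(p, best):
            best = alpha_u
    return best


def perseus_stage(beliefs, alphas, P, O, C, gamma, rng):
    """One stage: back up randomly chosen beliefs until none is worse off."""
    old = [value(p, alphas) for p in beliefs]
    todo = list(range(len(beliefs)))
    new_alphas = []
    while todo:
        idx = rng.choice(todo)
        p = beliefs[idx]
        alpha = backup(p, alphas, P, O, C, gamma)
        if dot(p, alpha) <= old[idx]:
            new_alphas.append(alpha)          # the backup improved this belief
        else:
            # Keep the best old hyperplane for this belief.
            new_alphas.append(min(alphas, key=lambda a: dot(p, a)))
        # Only beliefs whose cost is still above the old value remain.
        todo = [i for i in todo if value(beliefs[i], new_alphas) > old[i]]
    return new_alphas


# Toy two-state example (assumed): action 0 keeps the sensor asleep and yields
# an erasure (observation 2); action 1 reveals the state at energy cost 0.3.
P = [[0.9, 0.1], [0.1, 0.9]]
O = [[[0, 0, 1], [0, 0, 1]], [[1, 0, 0], [0, 1, 0]]]
C = [[1.0, 0.0], [0.3, 0.3]]
beliefs = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
alphas = [[10.0, 10.0]]  # upper bound: max stage cost / (1 - gamma)
alphas = perseus_stage(beliefs, alphas, P, O, C, 0.9, random.Random(0))
```

Because a single backup may improve several sampled beliefs at once, the number of vectors returned per stage never exceeds, and is often much smaller than, the number of sampled beliefs.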

Fig. 2 illustrates the progress of one iteration of Perseus. The x-axis represents the belief space, with 
circles representing the sampled belief set V = {p_1, ..., p_r}. The y-axis is the value function at 
consecutive iterations, i.e., J^{k−1} (solid lines) and J^k (dashed lines). The figure displays the α vectors and 
the different steps illustrating the progress of the algorithm. The algorithm selects a belief point at random 
and updates the value function for that belief. Then a new update is carried out for a belief point randomly 
selected from the set of remaining beliefs, i.e., beliefs which did not improve in the previous step. The 
algorithm repeats until all belief points are updated. In a way, the Perseus updates in POMDPs are the 
counterpart of asynchronous dynamic programming for MDPs [13], since the order of backup of the belief 
points is arbitrary and does not require full sweeps over the entire sampled belief set. 

Fig. 2: One iteration of Perseus illustrating the progress of the algorithm. The x-axis represents the 
belief space with circles representing the sampled belief set V = {p_1, ..., p_r}. The y-axis is the value 
function at consecutive iterations, i.e., J^{k−1} and J^k. Solid lines represent the hyper-planes in the (k−1)-th 
iteration and dashed lines represent the newly added hyper-planes during the k-th iteration. (a) The initial 
value function J^{k−1}; (b) a belief point is randomly selected and a new α vector is added to J^k. This update 
step only improves the selected point. Dark circles represent belief points which did not yet improve; (c) another 
belief point is sampled and a new hyperplane is added, which improves the value for several belief points; (d) only 
one belief point did not improve, thus it is sampled and a new hyperplane is added to J^k; (e) all belief points 
improved, J^k is computed, and the iteration ends. 






1) Sampling actions based on the support of the belief: Note that the update equation (18) involves 
a minimization over all control actions in U. Even though one iteration of the algorithm is linear in 
the cardinality |U| of the control space, |U| itself is exponential in the number of sensors, rendering the 
minimization infeasible for a relatively large sensor network. 

The idea here is to exploit the structure of the scheduling/tracking problem. Since the target transition 
model is naturally sparse, we predict relatively small uncertainty regions for the target state at future time 
steps. More specifically, for every belief point in V, we use prior information about the target transition 
model to project the future state of the target. This is particularly useful when the current belief vector 
is sparse leading to more restricted uncertainty regions. Subsequently, we restrict our attention to a 
significant subset of sensors, that is, sensors of relevance to the particulars of the uncertainty region. 
Hence, we only consider scheduling actions that involve different combinations of a reduced 
number of sensors, which considerably reduces the control space for every belief in V. If the number 
of significant sensors is still large, we randomly sample actions from the reduced control space. Note 
that the same intuition extends to more complex motion models wherein information about target speed, 
maneuver, and acceleration can be factored in to define the future uncertainty regions. Hence, instead of 
performing full updates including 2^n actions, we perform the minimization over a reduced control space 
for every p ∈ V. Specifically, we redefine the point update equation as: 

\[
\alpha = \arg\min_{\{\alpha_u\}_{u \in \mathcal{U}(p)}} p \cdot \alpha_u \qquad (19)
\]

where U{p) designates the reduced control space for the belief vector p. 
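One possible way to construct the reduced control space U(p) is sketched below. It assumes a discrete state space with transition matrix P, a hypothetical input `coverage[l]` giving the set of locations within range of sensor l, and a threshold `eps` deciding which predicted states count as part of the uncertainty region; none of these names come from the paper. Sensors that cover no state in the predicted support are kept asleep, and if the remaining combinations exceed `max_actions`, distinct actions are sampled at random.

```python
import random
from itertools import product


def predicted_support(p, P, eps=1e-3):
    """States j whose one-step predicted probability [pP]_j exceeds eps."""
    m = len(P[0])
    pred = [sum(p[i] * P[i][j] for i in range(len(p))) for j in range(m)]
    return {j for j, q in enumerate(pred) if q > eps}


def reduced_actions(p, P, coverage, max_actions=32, eps=1e-3, rng=None):
    """Enumerate or sample activation vectors over the relevant sensors only."""
    rng = rng or random.Random(0)
    support = predicted_support(p, P, eps)
    # A sensor is relevant if it covers at least one state in the support.
    relevant = [l for l, cov in enumerate(coverage) if cov & support]
    k = len(relevant)
    if 2 ** k <= max_actions:
        patterns = list(product((0, 1), repeat=k))
    else:
        seen = set()
        while len(seen) < max_actions:
            seen.add(tuple(rng.randint(0, 1) for _ in range(k)))
        patterns = sorted(seen)
    actions = []
    for bits in patterns:
        u = [0] * len(coverage)      # irrelevant sensors stay asleep
        for l, bit in zip(relevant, bits):
            u[l] = bit
        actions.append(tuple(u))
    return actions
```

For a sparse belief and a sparse transition model, the number of relevant sensors k is typically much smaller than n, so the minimization in (19) runs over at most 2^k (or `max_actions`) candidates instead of 2^n.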

Note that future iterations of the algorithm involving a particular belief point ensure sufficient sampling 
of relevant control actions in the reduced control space. This approach is well suited to Perseus, wherein 
the value of every belief point is guaranteed to improve over consecutive stages of the algorithm. It 
is worth mentioning that the observation and cost models need to be computed on the fly for each 
sampled control action during the algorithm implementation. 

2) Observation aggregation: The point update equation (18) involves back-projecting all hyper-planes 
in the current iteration one step from the future and returning the vector that minimizes the value of 
the belief. Since this involves computing a cross sum by enumerating all possible combinations of 
α vectors for the different observations, a number of vectors that is exponential in the number 
of observations is generated at each stage. The recursion has to be redefined to address continuous 
observation models. Looking carefully at (18), it is not hard to see that if different observations map 
to the same minimizing hyperplane, then they can be aggregated EDI . Hence, if we can partition the 
observation space into regions that map to the same hyperplane (possibly non-contiguous), the continuous 
model is reduced to a corresponding discrete model. Integration is replaced by a summation over these 
partitions, and the weighting probabilities are obtained by integrating the conditional density over these 
partitions. This is clarified in the following: 



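\[
\int_{s} \min_{i} \sum_{b'} p(s \mid u, b') \sum_{b} p(b' \mid b)\, p(b)\, \alpha_i(b')\, ds = \sum_{j} \int_{S_j} \sum_{b'} p(s \mid u, b') \sum_{b} p(b' \mid b)\, p(b)\, \alpha_j(b')\, ds \qquad (20)
\]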

To find the regions of aggregate observations, we need to solve for the boundaries, i.e., for each pair 
of α vectors we need to solve for s: 

\[
\alpha_i \cdot \phi(p,u,s) = \alpha_j \cdot \phi(p,u,s) \qquad (21)
\]

where φ(p,u,s) = p'(b') ∝ Σ_b p(b) p(s|b',u) p(b'|b). 
Hence, we need to solve: 

\[
\sum_{b'} \big(\alpha_i(b') - \alpha_j(b')\big)\, [pP]_{b'} \exp\Big\{ -\frac{1}{2} \sum_{\ell:\, u_\ell = 1} \Big( \frac{s_\ell - m_{b'}(\ell)}{\sigma} \Big)^{2} \Big\} = 0 \qquad (22)
\]

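After solving for the boundaries, we can readily define the regions: 

\[
S_{j^*} = \{ s \mid j^* = \arg\min_{j}\, \alpha_j \cdot \phi(p,u,s) \} \qquad (23)
\]

Now the update step is simply: 

\[
J(p) = g(p,u^*) + \sum_{j} \sum_{b'} [pP]_{b'} \Pr[S_j \mid u^*, b']\, \alpha_j(b') \qquad (24)
\]

where 

\[
\Pr[S_j \mid u^*, b'] = \int_{S_j} p(s \mid u^*, b')\, ds.
\]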



Finding a closed-form analytical solution for (22) is not feasible. Instead, we use Monte Carlo simulations 
to solve for the boundaries and obtain estimates of the weighting probabilities by sampling observations from 
p(s|u, b') for different combinations of actions and target states. 
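The sampling step can be sketched as follows for the Gaussian observation model. For each candidate next state b', observations are drawn on the active sensors, each sample is assigned to the α vector that minimizes α · φ(p,u,s), and the empirical frequencies estimate Pr[S_j | u, b']. This is a minimal sketch under stated assumptions: `pred[b]` stands for [pP]_b, `means[b][l]` is an assumed mean signal strength at sensor l when the target is at b, and the function names are hypothetical. Since the minimizer is invariant to positive scaling, φ is left unnormalized.

```python
import math
import random


def likelihood(s, mean, active, sigma):
    """Unnormalized Gaussian likelihood over the active sensors only."""
    e = sum((s[l] - mean[l]) ** 2 for l in active)
    return math.exp(-e / (2.0 * sigma ** 2))


def region_probabilities(pred, means, active, alphas, sigma,
                         n_samples=50, rng=None):
    """Monte Carlo estimate of probs[b'][j] = Pr[S_j | u, b']."""
    rng = rng or random.Random(0)
    m = len(pred)
    probs = [[0.0] * len(alphas) for _ in range(m)]
    for b2 in range(m):
        for _ in range(n_samples):
            # Sample an observation on the active sensors given next state b2.
            s = {l: rng.gauss(means[b2][l], sigma) for l in active}
            # Unnormalized updated belief phi(p, u, s).
            phi = [pred[b] * likelihood(s, means[b], active, sigma)
                   for b in range(m)]
            # Index of the minimizing hyperplane for this observation.
            j_star = min(range(len(alphas)),
                         key=lambda j: dot_phi(alphas[j], phi))
            probs[b2][j_star] += 1.0 / n_samples
    return probs


def dot_phi(alpha, phi):
    return sum(a * f for a, f in zip(alpha, phi))
```

With these estimates, the integral in (24) reduces to a finite sum over the aggregated regions, one term per α vector.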
D. Lower bounds 

We are able to derive lower bounds on the energy-tracking tradeoff for the simple as well as the 
continuous Gaussian observation models. For the simple model, the Q_MDP value function is itself a lower 
bound on the expected total cost since more information is available to the controller at future time steps 
given the reduced uncertainty assumption. To further clarify, observe that if we interchange the order of 
minimization and summation in the last term of (13), we obtain a lower bound on the optimal cost-to-go 
function. Hence, a lower bound can be obtained from the solution of the following equation: 



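\[
\begin{aligned}
J(p) &= \min_{u \in \{0,1\}^n} \Big\{ \sum_{i=1}^{n} [pP]_i\, \mathbb{1}\{u_i = 0\} + \sum_{\ell=1}^{n} c\, \mathbb{1}\{u_\ell = 1\} \\
&\qquad\quad + \sum_{b'} \mathbb{1}\{u_{b'} = 1\}\, [pP]_{b'} \min_{\alpha} \alpha \cdot e_{b'} + \sum_{b'} \mathbb{1}\{u_{b'} = 0\}\, [pP]_{b'} \min_{\alpha} \alpha(b') \Big\} \\
&= \min_{u \in \{0,1\}^n} \Big\{ \sum_{i=1}^{n} [pP]_i\, \mathbb{1}\{u_i = 0\} + \sum_{\ell=1}^{n} c\, \mathbb{1}\{u_\ell = 1\} + \sum_{b'} [pP]_{b'} \min_{\alpha} \alpha \cdot e_{b'} \Big\} \\
&= \min_{u \in \{0,1\}^n} \Big\{ \sum_{i=1}^{n} [pP]_i\, \mathbb{1}\{u_i = 0\} + \sum_{\ell=1}^{n} c\, \mathbb{1}\{u_\ell = 1\} + \sum_{b'} [pP]_{b'}\, J(e_{b'}) \Big\}. \qquad (25)
\end{aligned}
\]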

Interchanging the order of the summation and minimization corresponds to a fully observable state after 
the next scheduling action, i.e., the future belief is e_{b'}. Hence, the Q_MDP value function is a lower 
bound on the cost function of the original problem. 

Unfortunately, this is only true for the simplistic model and does not extend to the coupled models, 
since the factored tracking cost in (17) need not be a lower bound on the true tracking cost. 

To obtain a lower bound on the optimal energy-tracking tradeoff for such models, we combine the 
observable-after-control assumption with a decomposable lower bound on the tracking cost, which we 
derive next. Consider the continuous observation model with discrete state space. Given the current 
belief p_k and a control vector u_k, the expected tracking cost can be written as: 

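\[
\begin{aligned}
E[d(b_{k+1}, \hat{b}_{k+1}) \mid p_k, u_k] &= \sum_{j=1}^{m} \Pr[\hat{b}_{k+1} \neq j \mid p_k, u_k, b_{k+1} = j]\, \Pr[b_{k+1} = j \mid p_k, u_k] \\
&= \sum_{i=1}^{m} p_k(i) \sum_{j=1}^{m} p(b_{k+1} = j \mid b_k = i)\, \Pr[\hat{b}_{k+1} \neq j \mid p_k, u_k, b_{k+1} = j] \qquad (26)
\end{aligned}
\]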

Define 

\[
p(E \mid H_j) \triangleq \Pr[\hat{b}_{k+1} \neq j \mid p_k, u_k, b_{k+1} = j],
\]

which is a conditional error probability for a multiple hypothesis testing problem with m hypotheses, 
each corresponding to a different mean vector contaminated with white Gaussian noise. Conditioned on 
H_j, the observation model is: 

\[
H_j : \; s(\ell) = \big(m_j(\ell) + w(\ell)\big)\, \mathbb{1}\{u_{k,\ell} = 1\} + \epsilon\, \mathbb{1}\{u_{k,\ell} = 0\} \qquad (27)
\]

where s(ℓ) is the ℓ-th entry of an n × 1 vector s denoting the received signal strength at the n sensors, m_j 
is the mean received signal strength when the target is at state j (the j-th hypothesis), and w is zero-mean 
white Gaussian noise, i.e., w ~ N(0, σ²I). According to (27), sensor ℓ gets a Gaussian observation, 
which depends on the future target location, if activated at the next time step, and an erasure ε otherwise. 
Since the current belief is p_k, the prior for the j-th hypothesis is π_j = [p_k P]_j. The error event E can 
be written as the union of pairwise error regions as 

\[
p(E \mid H_j) = \Pr\Big[ \bigcup_{k \neq j} \zeta_{kj} \,\Big|\, H_j \Big] \qquad (28)
\]

where 

\[
\zeta_{kj} = \Big\{ s : L_{kj}(s) > \frac{\pi_j}{\pi_k} \Big\}
\]

is the region of observations for which the k-th hypothesis H_k is more likely than the j-th hypothesis 
H_j, and where 

\[
L_{kj}(s) \triangleq \frac{f(s \mid H_k)}{f(s \mid H_j)}
\]

denotes the likelihood ratio for H_k and H_j. Using standard analysis for likelihood ratio tests [21], [22], 
it is not difficult to show that: 

\[
p(\zeta_{kj} \mid H_j) = Q\Big( \frac{d_{kj}}{2} + \frac{\ln(\pi_j / \pi_k)}{d_{kj}} \Big) \qquad (29)
\]

where d_{kj}² = Δm_{kj}^T Δm_{kj} / σ², Δm_{kj} = m_k − m_j, and Q(·) is the normal distribution Q-function. The 
quantity d_{kj} plays the role of a distance between the two hypotheses and hence depends on the difference 
of their corresponding mean vectors and the noise variance σ². Note that, for different values of k and 
j, the regions ζ_{kj} are not generally disjoint, but they allow us to lower bound the error probability in terms 
of pairwise error probabilities; namely, a lower bound can be written as: 

\[
p(E \mid H_j) \geq \max_{k \neq j}\, p(\zeta_{kj} \mid H_j). \qquad (30)
\]
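The pairwise bound in (29) and (30) is straightforward to evaluate numerically. The sketch below computes the Q-function through the complementary error function and takes the maximum pairwise error over the competing hypotheses. It assumes that the mean vectors passed in are already restricted to the active sensors and that all priors involved are strictly positive; the function names are hypothetical.

```python
import math


def q_function(x):
    """Tail probability of the standard normal distribution."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def pairwise_error(m_k, m_j, prior_k, prior_j, sigma):
    """Probability, under H_j, that H_k looks more likely than H_j, as in (29)."""
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(m_k, m_j))) / sigma
    if d == 0.0:
        # Identical means: the observation is uninformative, so the decision
        # depends on the priors alone.
        return 1.0 if prior_k > prior_j else 0.0
    return q_function(d / 2.0 + math.log(prior_j / prior_k) / d)


def error_lower_bound(j, means, priors, sigma):
    """Lower bound (30): the largest pairwise error against hypothesis j."""
    return max(pairwise_error(means[k], means[j], priors[k], priors[j], sigma)
               for k in range(len(means)) if k != j and priors[k] > 0.0)
```

For two equally likely hypotheses whose means are two noise standard deviations apart, d_{kj} = 2 and the pairwise error equals Q(1), roughly 0.159.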
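We can then readily lower bound the expected tracking error: 

\[
\begin{aligned}
E[d(b_{k+1}, \hat{b}_{k+1}) \mid p_k, u_k] &\geq \sum_{i=1}^{m} p_k(i) \sum_{j=1}^{m} p(b_{k+1} = j \mid b_k = i) \max_{k \neq j}\, p(\zeta_{kj} \mid H_j) \\
&= \sum_{i=1}^{m} p_k(i) \sum_{j=1}^{m} p(b_{k+1} = j \mid b_k = i) \max_{k \neq j}\, Q\Big( \frac{d_{kj}}{2} + \frac{\ln(\pi_j / \pi_k)}{d_{kj}} \Big) \qquad (31)
\end{aligned}
\]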

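Next we separate out the effect of each sensor on the tracking error: 

\[
\begin{aligned}
E[d(b_{k+1}, \hat{b}_{k+1}) \mid p_k, u_k] \geq{}& \mathbb{1}\{u_{k,\ell} = 1\}\, E[d(b_{k+1}, \hat{b}_{k+1}) \mid p_k, u_k = \mathbf{1}] \\
&+ \mathbb{1}\{u_{k,\ell} = 0\}\, E[d(b_{k+1}, \hat{b}_{k+1}) \mid p_k, u_{k,i} = 1\ \forall i \neq \ell] \quad \text{for every } \ell \qquad (32)
\end{aligned}
\]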

where 1 is the vector of all ones, designating that all sensors will be active at the next time slot. The 
inequality in (32) follows from the fact that, if we separate out the effect of the ℓ-th sensor, we get a better 
tracking performance when all the remaining sensors are awake. Since this holds for every ℓ, a lower 
bound on the expected tracking error can be written as a convex combination of all sensors' contributions: 

\[
\begin{aligned}
E[d(b_{k+1}, \hat{b}_{k+1}) \mid p_k, u_k] \geq \sum_{\ell=1}^{n} \lambda_\ell(p_k) \Big\{ & \mathbb{1}\{u_{k,\ell} = 1\}\, E[d(b_{k+1}, \hat{b}_{k+1}) \mid p_k, u_k = \mathbf{1}] \\
&+ \mathbb{1}\{u_{k,\ell} = 0\}\, E[d(b_{k+1}, \hat{b}_{k+1}) \mid p_k, u_{k,i} = 1\ \forall i \neq \ell] \Big\} \qquad (33)
\end{aligned}
\]

where Σ_ℓ λ_ℓ(p_k) = 1. 

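Let 1_{−ℓ} denote a vector of length n with all entries equal to one except for the ℓ-th entry being zero, 
and let d_{kj}(u) denote the distance d_{kj} evaluated for the control vector u. 
Then, substituting from (31), 

\[
\begin{aligned}
E[d(b_{k+1}, \hat{b}_{k+1}) \mid p_k, u_k] \geq \sum_{\ell=1}^{n} \lambda_\ell(p_k) \Big( & \mathbb{1}\{u_{k,\ell} = 1\} \sum_{i=1}^{m} p_k(i) \sum_{j=1}^{m} p(b_{k+1} = j \mid b_k = i) \max_{k \neq j}\, Q\Big( \frac{d_{kj}(\mathbf{1})}{2} + \frac{\ln(\pi_j / \pi_k)}{d_{kj}(\mathbf{1})} \Big) \\
&+ \mathbb{1}\{u_{k,\ell} = 0\} \sum_{i=1}^{m} p_k(i) \sum_{j=1}^{m} p(b_{k+1} = j \mid b_k = i) \max_{k \neq j}\, Q\Big( \frac{d_{kj}(\mathbf{1}_{-\ell})}{2} + \frac{\ln(\pi_j / \pi_k)}{d_{kj}(\mathbf{1}_{-\ell})} \Big) \Big) \qquad (34)
\end{aligned}
\]

To simplify notation, we define the following two quantities: 

\[
T_1(p; i, \ell) \triangleq \sum_{j=1}^{m} p(b_{k+1} = j \mid b_k = i) \max_{k \neq j}\, Q\Big( \frac{d_{kj}(\mathbf{1})}{2} + \frac{1}{d_{kj}(\mathbf{1})} \ln \frac{[pP]_j}{[pP]_k} \Big)
\]

\[
T(p; i, \ell) \triangleq \sum_{j=1}^{m} p(b_{k+1} = j \mid b_k = i) \max_{k \neq j}\, Q\Big( \frac{d_{kj}(\mathbf{1}_{-\ell})}{2} + \frac{1}{d_{kj}(\mathbf{1}_{-\ell})} \ln \frac{[pP]_j}{[pP]_k} \Big)
\]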
Intuitively, T_1(p; i, ℓ) represents the contribution of sensor ℓ to the total expected tracking cost when the 
underlying state is i, the belief is p, and all sensors are awake. On the other hand, T(p; i, ℓ) is the 
ℓ-th sensor's contribution when it is inactive and all the other sensors are awake. 

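Now if we assume that the target will be perfectly observable after taking the scheduling action, a lower 
bound on the total cost can be readily obtained from the solution of the following Bellman equation: 

\[
J(p) = \sum_{\ell} J^{(\ell)}(p) \qquad (35)
\]

where 

\[
\begin{aligned}
J^{(\ell)}(p) = \min_{u_\ell \in \{0,1\}} \Big\{ & \mathbb{1}\{u_\ell = 1\} \Big[ \sum_{b} p(b)\, \lambda_\ell(p)\, T_1(p; b, \ell) + c \sum_{i=1}^{m} [pP]_i \Big] \\
&+ \mathbb{1}\{u_\ell = 0\} \sum_{b} p(b)\, \lambda_\ell(p)\, T(p; b, \ell) + \sum_{i=1}^{m} [pP]_i\, J^{(\ell)}(e_i) \Big\} \qquad (36)
\end{aligned}
\]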

Note that if we can solve the equation above for p = e_i for all i ∈ {1, ..., m}, then it is straightforward 
to find the solution for all other values of p. We therefore focus on specifying the value function at those 
points. Since this is the case, we further simplify our notation and use T(i,ℓ) and λ(i,ℓ) as shorthand 
for T(e_i; i, ℓ) and λ_ℓ(e_i), respectively. We can see that a lower bound on the value function of sensor ℓ 
can be obtained as a solution to the following minimization problem over u: 

\[
J^{(\ell)}(e_b) = \min\Big\{ \lambda(b,\ell)\, T(b,\ell)\,;\; \lambda(b,\ell)\, T_1(b,\ell) + c \sum_{i=1}^{m} [e_b P]_i \Big\} + \sum_{i=1}^{m} [e_b P]_i\, J^{(\ell)}(e_i) \qquad (37)
\]

Equation (37) together with (35) defines a lower bound on the total expected cost. To further tighten 
the bound, we can now optimize over a matrix Λ for every value of c, where Λ(c) is an m × n matrix 
with the (i,ℓ) entry equal to λ(i,ℓ), i.e., Λ(c) = {λ(i,ℓ)}. Hence 

\[
J(e_b) = \max_{\Lambda(c)} \sum_{\ell=1}^{n} \Big( \min\Big\{ \lambda(b,\ell)\, T(b,\ell)\,;\; \lambda(b,\ell)\, T_1(b,\ell) + c \sum_{i=1}^{m} [e_b P]_i \Big\} + \sum_{i=1}^{m} [e_b P]_i\, J^{(\ell)}(e_i) \Big) \qquad (38)
\]

subject to Λ 1_n = 1_m, 
where 1_m is a column vector of all ones of length m. 

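The inner recursion can be solved to obtain a closed-form solution for J^{(ℓ)}(e_b) as: 

\[
J^{(\ell)}(e_b) = \sum_{j=0}^{\infty} \sum_{i=1}^{m} \min\Big\{ [e_b P^{j}]_i\, \lambda(i,\ell)\, T_1(i,\ell) + c \sum_{k=1}^{m} [e_b P^{j+1}]_k \,;\; [e_b P^{j}]_i\, \lambda(i,\ell)\, T(i,\ell) \Big\} \qquad (39)
\]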

Since the problem is only constrained across the different sensors, we obtain a lower bound from the 
solution of the following optimization problem, 

\[
\sum_{i=1}^{m} \max_{\lambda(i,\cdot)} \sum_{\ell=1}^{n} \sum_{j=0}^{\infty} [e_b P^{j}]_i \min\Big\{ \lambda(i,\ell)\, T_1(i,\ell) + c \sum_{k=1}^{m} [e_i P]_k \,;\; \lambda(i,\ell)\, T(i,\ell) \Big\} \qquad (40)
\]

subject to 

\[
\sum_{\ell=1}^{n} \lambda(i,\ell) = 1 \quad \forall i = 1, \dots, m.
\]

We observe that for every i we are maximizing a concave piecewise linear function in λ(i,ℓ). We pose 
an equivalent convex optimization problem by realizing that the minimum of a set of concave functions 
is also concave. Since affine functions are concave, we can apply the technique here. Since the problem 
is unconstrained across the i dimension, we focus on solving the max-min problem for a fixed i. The 
final solution can then be obtained by summing the objective function over the m subproblems. 

For each ℓ = 1, ..., n, add a variable t_ℓ to the optimization problem. Also, for every ℓ, append two 
constraints to the optimization problem. The constraints state the minimization over u_ℓ implicitly, by 
requiring that λ(i,ℓ) T_1(i,ℓ) + c Σ_{k=1}^{m} [e_i P]_k ≥ t_ℓ and λ(i,ℓ) T(i,ℓ) ≥ t_ℓ. The modified problem is 
therefore: 

\[
\begin{aligned}
\underset{\lambda(i,\ell),\, t_\ell;\ \ell = 1, \dots, n}{\text{maximize}} \quad & \sum_{\ell=1}^{n} t_\ell \\
\text{subject to} \quad & \sum_{\ell=1}^{n} \lambda(i,\ell) \leq 1, \\
& \lambda(i,\ell)\, T_1(i,\ell) + c \sum_{k=1}^{m} [e_i P]_k \geq t_\ell, \\
& \lambda(i,\ell)\, T(i,\ell) \geq t_\ell, \quad \ell = 1, \dots, n,
\end{aligned} \qquad (41)
\]

which can be readily solved using standard convex optimization techniques [23]. 

IV. Results and Simulations 

In this section, we show experimental results illustrating the performance of the proposed scheduling 
policies for the different models considered in this paper. In each simulation run, the object was initially 
placed at the center of the network and the simulation run concluded when the object reached the 
absorbing state r. We perform Monte Carlo runs to compute the average tracking and energy costs for 
different values of the energy parameter c. For the planning phase in case of point-based policies, beliefs 
are sampled by simulating multiple object trajectories through the sensor network. Each trajectory starts 
from a random state sampled from the initial belief, picking actions at random, until the target leaves 
the network. 

First, we consider the simple model in Section II-A with a linear network of 41 sensors. Figure 3(a) 
shows the tradeoff curve between the number of active sensors per unit time and the tracking error 
per unit time using the point-based and the Q_MDP policies. The figure also shows a lower bound on 
the optimal performance (see Section III-D). It is clear that both policies lead to tradeoffs that closely 
approach the lower bound. The Q_MDP policy gets even closer to the lower bound at small tracking errors 
since the observable-after-control assumption is more meaningful in this regime. In Fig. we show 
convergence results for the point-based algorithm with reduced control space minimization. The top left 
subplot displays the convergence of the sum cost of all the belief points in V; the top right shows the 
expected cost averaged over many trajectories; the bottom left subplot shows the number of hyper-planes 
constituting the value function as a function of time; the bottom right subplot shows the number of 
policy changes versus time, i.e., the number of belief points for which the optimal action changed over 
two consecutive iterations of the algorithm. 

Fig. 3: Simplistic model 

Fig. 4: A sensor network with overlapping sensing ranges (12 sensors and 20 object locations). An edge 
connects a sensor to a given location if this location falls within the sensing range of that sensor. 

Figure 5 displays the tradeoff curves for the network in Fig. 4 with a probabilistic observation model. 
The network is composed of 12 sensors and 20 object locations with the shown connectivity such that the 
observation ranges of the different sensors overlap. Since the tracking error for this model is inherently 
coupled across sensors, the global point-based policy clearly outperforms the learning-based Q_MDP policy. 

Fig. 5: Overlap model 

Next, we consider a network of 10 sensors where the object locations lie on the integers from 1 to 
21. The observation of each sensor is continuous, as in (6). For every object state and every scheduling 
action in the reduced control space, we sample 50 observations to construct estimates of the weighting 
probabilities and compute the aggregate observation boundaries. Up to 32 actions are sampled from the 
reduced control space. In this setup, the belief set consists of 500 sampled belief vectors and we assume a 
Hamming error cost. Fig. 6 shows the performance of the different policies for the continuous observation 
model. The point-based scheduling policy outperforms the Q_MDP policy. We further show 
a lower bound on the optimal performance tradeoff. The lower bound is loose, especially in the high 
tracking error regime, since the derived bound on per-sensor tracking errors assumes all other sensors are 
awake. However, we can exactly compute the saturation point for the optimal scheduling policy since 
every policy has to eventually meet the all-asleep performance curve, shown in Fig. 6, when the energy 
cost per sensor is high. At that point, all sensors are inactive and hence the target estimate can only be 
based on prior information. 

Figure title: Total cost versus c (continuous observation model) 

V. Conclusions 

In this paper we studied the problem of tracking an object moving randomly through a dense network 
of wireless sensors. We devised approximate strategies for scheduling the sensors to optimize the tradeoff 
between tracking performance and energy consumption for a wide range of models. First, we proposed 
policies that rely on an observable-after-control assumption (Q_MDP policies). Key to this solution is 
the decoupling of the optimization problem into per-sensor subproblems combined with 
simulation-based learning of individual tracking costs for each subproblem. Second, we developed point-based 
sensor scheduling strategies which optimize the value function over a small set of reachable beliefs 
within the belief simplex. Based on the belief support and the sparsity of the transition models, we 
developed a methodology to sample actions from reduced control spaces. This was combined with 
observation aggregation to address the complexity of the observation space for continuous observation 
models. In some cases we derived lower bounds on the optimal tradeoff curves. While being suboptimal, 
the generated scheduling policies often provide close-to-optimal energy-tracking tradeoffs. Developing 
distributed scheduling strategies when no central controller is available is an area for future research. 
Another interesting challenge is when the statistics for object movement are unknown or partially known. 